Thread: modify char array in function || reading lines from file

  1. #1
    Registered User, Join Date: Oct 2013, Posts: 87

    modify char array in function || reading lines from file

    I read in a file with four columns, skipping line one (the header). The file is tab/space delimited; my goal is to get the last column.

    I use a function whose parameter is a char array. Inside the function everything works; however, it doesn't move the char array's pointer to the new location in the caller.

    I cannot understand why, or how to move the char pointer to that location. Back in main, it prints the complete line, i.e. 1, start coord, end coord and gene.

    In the function, the gene name is printed fine, i.e. the pointer is moved correctly there.


    Code:
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    #include "funcs_combinations.h" // presumably declares add_node() and start

    #define MAX_LEN 200

    void get_gene_name(char temp_line[])
    {
        size_t space_count = 0; //get location of last space
        size_t itr = 0; //iterate
        size_t len_line = strlen(temp_line);

        while (itr < len_line)
        {
            if (temp_line[itr] == ' ' || temp_line[itr] == '\t')
            {
                space_count = itr; //store space location
            }
            itr++;
        }
        printf("space location was %zu\n", space_count);
        temp_line = space_count + temp_line; //move pointer
        printf("moved location is %p\n", (void *)temp_line);
        printf("location is %s\n", temp_line); //printing works fine here but goes bad in returning
        // return (temp_line);
    }
    //////////////////////////////

    void remove_trailingspaces(char *newlines)
    {
        /*
        This function is called while reading the first line
        */
        size_t length = strlen(newlines);
        size_t i = 0;
        for (i = 0; i < length; i++)
        {
            if (newlines[i] == '\n')
            {
                newlines[i - 1] = '\0';
                break;
            }
        }
    }

    //////////////////////////////
    int main(int argc, char *argv[])
    {
        FILE *fptr;
        char c[MAX_LEN];
        int line_number = 0;

        if (argc < 2)
        {
            printf("usage: %s <input file>\n", argv[0]);
            return EXIT_FAILURE;
        }

        fptr = fopen(argv[1], "r");

        if (fptr == NULL)
        {
            printf("null pointer for file opening\n");
            return EXIT_FAILURE;
        }

        while (fgets(c, MAX_LEN, fptr))
        {
            if (line_number > 0) //skip the header (line 0)
            {
                if (strcmp(c, "\n") != 0) //only if not a blank line
                {
                    remove_trailingspaces(c);
                    get_gene_name(c);
                    printf("new gene is %s\n", c); // it doesn't move to new location
                    add_node(&start, c);
                }
            }

            line_number++;
        }

        fclose(fptr);

        return 0;
    }
    I have input file with lines as:

    CHR START END GENE
    1 1234 546 GENE1
    1 2234 5346 GENE12
    1 4234 5246 GENE14
    1 6234 5546 GENE16
    I want to get the last column, i.e. the gene name (GENE1, GENE12 and so on), from the function get_gene_name.
    Edit: fixed itr in the get_gene_name while loop.
    Last edited by deathmetal; 03-07-2021 at 07:29 PM. Reason: incorrect iteration

  2. #2
    C++ Witch laserlight, Join Date: Oct 2003, Location: Singapore, Posts: 28,413
    To understand the problem, let's go back to a simple example:
    Code:
    #include <stdio.h>
    
    void foo(int x)
    {
        x = 123;
    }
    
    int main(void)
    {
        int y = 0;
        foo(y);
        printf("%d\n", y);
        return 0;
    }
    As you can probably tell from reading the above code, the output of the above program is 0, not 123. The reason, of course, is that the assignment of 123 to x in foo only affects the local variable, not the variable from the caller. Now let's introduce a pointer:
    Code:
    #include <stdio.h>
    
    void foo(int *x)
    {
        *x = 123;
    }
    
    int main(void)
    {
        int y = 0;
        foo(&y);
        printf("%d\n", y);
        return 0;
    }
    Now, the output is indeed 123, because by assigning 123 to *x, what x points to, i.e., y, is modified. Let's go back to your code. As you may know, this:
    Code:
    void get_gene_name(char temp_line[])
    is equivalent to:
    Code:
    void get_gene_name(char *temp_line)
    So let's go back to the pointer example, but make it something like what you did in your code:
    Code:
    #include <stdio.h>
    
    void foo(int *x, int *p)
    {
        x = p;
    }
    
    int main(void)
    {
        int y = 0;
        int value = 123;
        foo(&y, &value);
        printf("%d\n", y);
        return 0;
    }
    Once again, the output is 0 instead of 123. The reason brings us back to the first example: the assignment of p to x only affects the local variable, not the variable from the caller.
    Quote Originally Posted by Bjarne Stroustrup (2000-10-14)
    I get maybe two dozen requests for help with some sort of programming or design problem every day. Most have more sense than to send me hundreds of lines of code. If they do, I ask them to find the smallest example that exhibits the problem and send me that. Mostly, they then find the error themselves. "Finding the smallest program that demonstrates the error" is a powerful debugging tool.
    Look up a C++ Reference and learn How To Ask Questions The Smart Way

  3. #3
    Registered User, Join Date: Oct 2013, Posts: 87
    Thank you.

    I guessed it could be an issue with local vs. global variables.
    Now I pass in another array, copy into it, and it works.


    Code:
    #include <stdio.h>
    #include <string.h>

    void get_gene_name(char temp_line[], char *temp_copy)
    {
        size_t space_count = 0; //get location of last space
        size_t itr = 0; //iterate
        size_t len_line = strlen(temp_line);

        while (itr < len_line)
        {
            if (temp_line[itr] == ' ' || temp_line[itr] == '\t')
            {
                space_count = itr; //store space location
            }
            itr++;
        }
        printf("space location was %zu\n", space_count);
        temp_line = space_count + temp_line; //move pointer (local copy only)
        printf("moved location is %p\n", (void *)temp_line);
        printf("location is %s\n", temp_line);

        temp_copy[0] = '\0';
        strcpy(temp_copy, temp_line); //copy the result into the caller's array
    }
    //////////////////////////////
    int main(void)
    {
        char copied[30];
        char length[80] = "value is nuts with";
        get_gene_name(length, copied);
        printf("new value is %s\n", copied);

        return 0;
    }
    I have the following two concerns:

    1) Is copying OK? I am looking at many genes (20-50K+).

    2) How do I avoid this local vs. global problem in the future? That is, what is the rule of thumb for when a change will be made to the caller's variable, and when it will not?

  4. #4
    C++ Witch laserlight, Join Date: Oct 2003, Location: Singapore, Posts: 28,413
    Quote Originally Posted by deathmetal
    1) is copying OK? I am looking at several genes (20-50K+)
    It is not wise to have the destination array smaller than the source array.

    But you may not need an auxiliary array, e.g., I think you might want to do something like this instead:
    Code:
    #include <ctype.h>
    #include <string.h>

    void trim_whitespace(char *text)
    {
        size_t len = strlen(text);
        if (len == 0)
        {
            return;
        }

        char *first_non_space = text;
        while (isspace((unsigned char)*first_non_space))
        {
            ++first_non_space;
        }

        char *last_non_space = text + len - 1;
        while (first_non_space < last_non_space && isspace((unsigned char)*last_non_space))
        {
            --last_non_space;
        }

        while (first_non_space <= last_non_space)
        {
            *text++ = *first_non_space++;
        }
        *text = '\0';
    }
    I suggest this because:
    • From your post #1, you want to trim whitespace from the end.
    • get_gene_name seems to only be trimming whitespace from the front to get the gene name.
    • You can do an in-place copy, but not with strcpy because strcpy is not permitted to have overlapping arguments (because depending on how it is implemented, an overlap could result in a bug).
    • You could use memmove, which does allow overlapping arguments, but in this case it looks like it may be simpler to just implement the copy directly such that a bug due to overlap is avoided.

    If your actual data is such that trimming whitespace from the front is uncommon, then you can make an optimisation to check if first_non_space == text, and if so, you only set:
    Code:
    *(last_non_space + 1) = '\0';
    EDIT:
    There's yet another possibility for optimisation though, which I realised I should mention after answering your second question: we could go back to your idea of pointer arithmetic for trimming whitespace from the start, but return the pointer instead of modifying the array, and only modify the array to trim whitespace from the end. In that case, you could do this:
    Code:
    #include <ctype.h>
    #include <string.h>

    char *trim_whitespace(char *text)
    {
        size_t len = strlen(text);
        if (len == 0)
        {
            return text;
        }

        char *first_non_space = text;
        while (isspace((unsigned char)*first_non_space))
        {
            ++first_non_space;
        }

        char *last_non_space = text + len - 1;
        while (first_non_space < last_non_space && isspace((unsigned char)*last_non_space))
        {
            --last_non_space;
        }

        *(last_non_space + 1) = '\0';
        return first_non_space;
    }
    To use this, you would do something that amounts to this:
    Code:
    char text[80];
    // populate text
    // ...
    char *result = trim_whitespace(text);
    // use result instead of text
    It is arguably a little error-prone since a programmer using this function might expect it to change the text array in-place, so you'll have to document it carefully to avoid a bug being introduced under maintenance.

    Quote Originally Posted by deathmetal
    2) How do I not run into problem of local-global in future? What I mean to ask is what is the rule of thumb when change is going to be made to the variable from caller and otherwise?
    That's easy: when you assign to a parameter, ask yourself if you want the change to be reflected in the caller. If you do, then assigning to the parameter is wrong, even if the parameter is a pointer. Rather, you need a pointer (and hence a pointer to a pointer if the parameter is already a pointer) so that you can dereference it, or you need to find some other way to accomplish the task, e.g., return a value.
    Last edited by laserlight; 03-07-2021 at 09:10 PM.

  5. #5
    Registered User, Join Date: Feb 2019, Posts: 1,078
    One question: since the lines (except the header) have the fixed format
    <int> <int> <int> <string>
    where the string has no spaces, why not read each line and use sscanf() to get these values?

    Code:
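    #include <stdio.h>

    #define MAX_BUFSIZE 200

    /* fin: an already-opened input file */
    int read_genes(FILE *fin)
    {
      char buffer[MAX_BUFSIZE];
      int v1, v2, v3;
      char s[64];

      // Ignore first line
      if ( ! fgets( buffer, MAX_BUFSIZE, fin ) )
        return -1; /* error reading header */

      while ( fgets( buffer, MAX_BUFSIZE, fin ) )
      {
        if ( sscanf( buffer, "%d %d %d %63s", &v1, &v2, &v3, s ) != 4 )
          continue; /* error reading data (e.g. a blank line): skip it */

        // process data
      }
      return 0;
    }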

  6. #6
    Registered User, Join Date: Oct 2013, Posts: 87
    Hi laserlight,
    the code you shared (trim_whitespace(char *text), the version ending with *text = '\0';) doesn't work on my end.

    By "doesn't work" I mean I don't get the gene name, but the complete line.
    It could be due to a space and a newline character together at the end of the line, as I copy-pasted the columns from Excel.

    However, I applied the same logic and modified my code:

    Code:
    void get_gene_name(char temp_line[])
    {
        size_t space_count = 0;
        size_t itr = 0;
        size_t len_line = strlen(temp_line);

        while (itr < len_line)
        {
            if (temp_line[itr] == ' ' || temp_line[itr] == '\t')
            {
                space_count = itr;
            }
            itr++;
        }

        itr = 0;       //iterate again
        space_count++; //start from next position
        while (space_count < len_line)
        {
            temp_line[itr++] = temp_line[space_count++]; //set character
        }
        temp_line[itr] = '\0';
    }
    This works fine, as intended. Please let me know how to optimize and improve the code, if possible.

    Thank you

  7. #7
    C++ Witch laserlight, Join Date: Oct 2003, Location: Singapore, Posts: 28,413
    My apologies, I misread both your description and your code: you're not doing a whitespace trim, but parsing to extract a field from a space-separated format. flp1969 in post #5 is right: for your particular parsing use case, fgets + sscanf would be better. You can skip storing the fields that you don't need by making use of * to suppress assignment, and then you just need to specify the field width for the field that you want to extract to avoid buffer overflow.

  8. #8
    Registered User, Join Date: Oct 2013, Posts: 87
    Quote Originally Posted by laserlight
    My apologies, I misread both your description and your code: you're not doing a whitespace trim, but parsing to extract a field from a space-separated format. flp1969 in post #5 is right: for your particular parsing use case, fgets + sscanf would be better. You can skip storing the fields that you don't need by making use of * to suppress assignment, and then you just need to specify the field width for the field that you want to extract to avoid buffer overflow.
    No problem.

    I will try the approach from #5.
    If you get a chance, can you please review the code I put in my last post?

    Thank you laser!
